Pivoting approaches for bulk extraction of Entity-Attribute-Value data

Authors

  • Valentin Dinu
  • Prakash M. Nadkarni
  • Cynthia Brandt
Abstract

Entity-Attribute-Value (EAV) data, as present in repositories of clinical patient data, must be transformed (pivoted) into one-column-per-parameter format before it can be used by a variety of analytical programs. Pivoting approaches have not been described in depth in the literature, and existing descriptions are dated. We describe and benchmark three alternative algorithms to perform pivoting of clinical data in the context of a clinical study data management system. We conclude that when the number of attributes to be returned is not too large, it is feasible to use static SQL as the basis for views on the data. An alternative but more complex approach that utilizes hash tables and the presence of abundant random-access-memory can achieve improved performance by reducing the load on the database server.
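As a rough illustration of what pivoting means here, the following is a minimal Python sketch of an in-memory, hash-table-based pivot. It is not the authors' benchmarked implementation: the function name `pivot_eav`, the tuple layout `(entity, attribute, value)`, and the sample attribute names are all assumptions made for this example.

```python
from collections import defaultdict


def pivot_eav(rows, attributes):
    """Pivot (entity, attribute, value) triples into one row per entity.

    Hypothetical helper for illustration only; `rows` stands in for the
    result of a bulk query against an EAV table.
    """
    wanted = set(attributes)
    # Hash table keyed by entity; each entry maps attribute -> value.
    by_entity = defaultdict(dict)
    for entity, attribute, value in rows:
        if attribute in wanted:
            # If an attribute repeats for an entity, the last value wins.
            by_entity[entity][attribute] = value
    # One column per requested attribute; missing values become None.
    return [
        (entity, *(values.get(a) for a in attributes))
        for entity, values in sorted(by_entity.items())
    ]


# Assumed sample data: four EAV rows describing two patients.
eav_rows = [
    (1, "heart_rate", 72),
    (1, "systolic_bp", 120),
    (2, "heart_rate", 80),
    (2, "weight_kg", 70.5),
]
pivoted = pivot_eav(eav_rows, ["heart_rate", "systolic_bp", "weight_kg"])
```

In this sketch the database only has to return the raw rows once, and the grouping work happens in application memory, which is the general trade-off the abstract attributes to the hash-table approach.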

Similar resources

Quality Impact of Value Matching and Scoring in Top-k Entity Attribute Extraction

The entity attribute extraction problem, or how to extract entities and their attribute values from natural language Web documents, is of critical importance for Web search and information access in general. Unfortunately, because of the noisy nature of the Web and its scale, entity attribute extraction is notoriously challenging in terms of both extraction efficiency and quality. In our earlier...

Attribute Extraction from Product Titles in eCommerce

This paper presents a named entity extraction system for detecting attributes in product titles of eCommerce retailers like Walmart. The absence of syntactic structure in such short pieces of text makes extracting attribute values a challenging problem. We find that combining sequence labeling algorithms such as Conditional Random Fields and Structured Perceptron with a curated normalization sc...

Optimized Entity Attribute Value Model: A Search Efficient Representation of High Dimensional and Sparse Data

Entity Attribute Value (EAV) is a widely used solution for representing high-dimensional and sparse data, but EAV is not search efficient for knowledge extraction. In this paper, we propose a search-efficient data model, Optimized Entity Attribute Value (OEAV), for the physical representation of high-dimensional and sparse data as an alternative to the widely used EAV. We have implemented both EAV a...

Named Entity Recognition in Persian Text using Deep Learning

Named entity recognition is a fundamental task in the field of natural language processing. It is also known as a subset of information extraction. The process of recognizing named entities aims at finding proper nouns in the text and classifying them into predetermined classes such as names of people, organizations, and places. In this paper, we propose a named entity recognizer which benefi...

An Alternative to F.R.B.R.?

Purpose: The aim of this article is to propose an alternative to F.R.B.R. Methodology: The methodology is based on library investigation and Web searching. Findings: In this article every bibliographical entity is studied from eight approaches: the first is the ontological one, which deals with three equal-valued elements with which the entity comes into being. They are author (corporate body), ...

Journal title:
  • Computer methods and programs in biomedicine

Volume 82, Issue 1

Pages: -

Publication date: 2006